Extracting Directional and Comparable Corpora from a Multilingual Corpus for Translation Studies
Abstract
Translation studies rely increasingly on corpus data to examine the specificities of translated texts, which may be translated from different original languages and are compared to original texts. In parallel, more and more multilingual corpora are becoming available for various natural language processing tasks. This paper questions the use of these multilingual corpora in translation studies and shows the methodological steps needed to obtain more reliably comparable sub-corpora that consist of original and directly translated text only. Various experiments are presented that show the advantage of directional sub-corpora.
Similar resources
Neural Machine Translation for Low Resource Languages using Bilingual Lexicon Induced by Comparable Corpora
Automatically extracting parallel sentence pairs from the multilingual articles available on the Internet can address the data sparsity problem in building multilingual natural language processing applications, especially in machine translation. In this project, we have used an end-to-end siamese bidirectional recurrent neural network to generate parallel sentences from comparable multilingual ...
Extracting a Parallel Corpus from Comparable Documents to Improve Translation Quality in Machine Translation Systems
Data used for training statistical machine translation methods are usually prepared from three resources: parallel, non-parallel, and comparable text corpora. Parallel corpora are an ideal resource for translation, but because such texts are scarce, non-parallel and comparable corpora are also used for parallel text extraction. Most existing methods for exploiting comparable corpora loo...
Reporting Preliminary Automatic Comparable Corpora Compilation Results
Translation and translation studies rely heavily on distinctive text resources, such as comparable corpora. Comparable corpora gather greater diversity of language-dependent phrases in comparison to multilingual electronic dictionaries or parallel corpora; and present a robust language resource. Therefore, we see comparable corpora compilation as impending in this technological era and suggest ...
Parallel and comparable corpora: what are they up to?
With ever increasing international exchange and accelerated globalisation, translation and contrastive studies are more popular than ever. As part of this new wave of research on translation and contrastive studies, corpora, and multilingual corpora in particular, have a prominent role. In this chapter, we will illustrate the value of parallel and comparable corpora to translation and contrasti...
Creating a Persian-English Comparable Corpus
Multilingual corpora are valuable resources for cross-language information retrieval and are available in many language pairs. However, the Persian language does not have rich multilingual resources due to some of its special features and difficulties in constructing the corpora. In this study, we build a Persian-English comparable corpus from two independent news collections: BBC News in Englis...